A TUNEABLE PERFORMANCE GRAMMAR


SUMMARY 

This paper describes a tuneable performance grammar
currently being developed for speech understanding. It shows how
attributes of words are defined and propagated to successively
larger phrases, how other attributes are acquired, how 'factors'
reference them to help the parser choose among competing
definitions in order to interpret the utterance correctly, and
how these factors can easily be changed to adapt the grammar to
other discourses and contexts. Factors that might be classified
as 'syntactic' are emphasized, but the attributes they reference
need not be, and seldom are, purely syntactic.

A performance grammar (PG) defines the form and meaning of
the kinds of utterances that occur in spontaneous dialog. When
the definitions of the grammar provide information that helps a
parser choose those rules most likely to lead to correct
interpretations of utterances, the grammar is said to be 'tuned'.
When the tuning is easily changed when the domain of discourse
changes, the grammar is said to be 'tuneable'. The ability to
tune a grammar is particularly important in speech understanding
where the inherent uncertainty of the input causes false paths
through the grammar to be multiplied.

This paper describes a tuneable PG being developed jointly
by SRI and SDC for a computer-based speech understanding system.
Its vocabulary and phrase types, selected from protocols, are
appropriate for asking and answering questions about properties
of submarines. The PG now defines over 70 word and phrase
categories. Its scope extends far beyond syntax. A discourse
component enables it to handle anaphora and ellipsis, as in:
"What is the surface displacement of the Lafayette? ... What is
its draft?", and "What is the length of the Lafayette? ... The
Ethan Allen?" A semantics component defines a common meaning for
paraphrases, as in "the speed of the Lafayette is 30 knots" and
"the Lafayette has a speed of thirty knots". (See Walker et al.,
1975; Paxton and Robinson, 1975; Hendrix, 1975; Deutsch, 1975.)

Each definition composing the PG has three parts. The first
names a word category or a phrase category and provides a
context-free production for its composition. The second part,
called 'attributes', tells how to determine the properties of an
instance of the category. Any definition can reference multiple
sources of knowledge (acoustic, syntactic, semantic, discourse,
or pragmatic) for information needed to determine attribute
values. The third part, 'factors', defines scores for
combinations of attributes, indicating how well they 'fit'. It
is through factor scores that the grammar is tuned. The
individual scores are combined into a composite score which is
used by the parser to choose among competing parsings. A
purported instance of the definition with a score of OUT for any
factor is immediately eliminated; a low score may eliminate a
parsing path; a high score enhances the priority of a parsing
path that applies the definition.

Our mnemonic terms for factor scores are VERYGOOD, GOOD, OK,
POOR, BAD, and OUT. These are estimates of likelihood. They are
necessarily vague, because we are dealing with gradual phenomena
and probabilistic tendencies. They mean something like "quite
likely", "expected", "ordinary", "odd but possible",
"unlikely; listen again", and "so special that we do not define
it". Rigid, prescriptive judgments are avoided. Combining
"foot" with "-s" as a plural noun is indeed wrong and therefore
OUT. On the other hand, "fuel" does combine with plural "-s"
with the specialized meaning "kinds of fuel". At present,
"fuels", like "foots", is judged to be OUT for our language, but
this judgment can easily be altered if we find that our language
users refer to kinds of fuel as "fuels".

Since factor scores can be changed without affecting the
rest of the definition, the grammar is tuneable to different
discourse domains and styles of speaking. Also, if one factor
defines a low score for an instantiation, others may still raise
the composite score. A statistically improbable phrase that
makes sense and is uttered intelligibly should not be unduly
difficult to recognize and interpret.
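
The three-part structure of a definition, and the way factor scores feed a composite score, can be illustrated with a minimal Python sketch. Only the score names and the rule that OUT eliminates an instance come from the text. The class name `Definition`, the numeric `WEIGHTS`, and the choice of multiplying the weights together are assumptions made for the example; the paper does not say how individual scores are combined.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

# Hypothetical numeric weights; the paper gives only the mnemonic names.
WEIGHTS = {"VERYGOOD": 2.0, "GOOD": 1.5, "OK": 1.0, "POOR": 0.5, "BAD": 0.25}

Attrs = Dict[str, object]
Factor = Callable[[Attrs], str]


@dataclass
class Definition:
    name: str                              # part 1: category name ...
    production: str                        # ... and context-free production
    attributes: Callable[[Attrs], Attrs]   # part 2: how attributes are derived
    factors: Dict[str, Factor] = field(default_factory=dict)  # part 3: scores

    def composite(self, constituent: Attrs) -> Optional[float]:
        """Return a composite score, or None if any factor scores OUT."""
        attrs = self.attributes(constituent)
        total = 1.0
        for factor in self.factors.values():
            label = factor(attrs)
            if label == "OUT":
                return None                # instance eliminated immediately
            total *= WEIGHTS[label]        # assumed combination: product
        return total


# A cut-down plural-noun definition with a single factor.
plural = Definition(
    name="N2",
    production="NOUN = N -PL",
    attributes=lambda n: {**n, "NBR": ("PL",)},
    factors={"CMU": lambda a: "OUT" if a["CMU"] == ("MASS",) else "OK"},
)

fuel = {"CMU": ("MASS",)}
before = plural.composite(fuel)            # "fuels" is OUT

# Retuning: replace one factor; production and attributes are untouched.
plural.factors["CMU"] = lambda a: "POOR" if a["CMU"] == ("MASS",) else "OK"
after = plural.composite(fuel)             # "fuels" is now merely POOR
```

Retuning touches only the `factors` entry, which mirrors the point that the judgment on "fuels" can be altered without changing the rest of the definition.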

The rest of this paper examines sequences of definitions
required for parsing and understanding a typical utterance. We
begin with word definitions, and show how the attributes of words
are propagated to successively larger phrases, how other
attributes peculiar to higher-level phrases are added, and how
factors reference them in tuning the grammar.

Preceding discourse and underlying semantic distinctions
constrain the surface syntax of an utterance. Because
superficial syntactic properties signal those constraints, it is
often economical to use syntactic factors in order to disconfirm
a wrong parsing path or confirm a correct one, avoiding calls on
semantics, discourse, and acoustics for expensive in-depth
evaluations. For example, if someone says "fuel supplies", we do
not want the parser to explore in depth the application of rules
that build a plural noun-phrase from "fuel s..." without
considering an alternative definition in which "fuel" is a
modifier of a countable nominal beginning with "s". To this end,
we include a factor that checks the countableness of "fuel" by
referencing a count/mass/unit (CMU) attribute, which is syntax
oriented but essentially semantics based.

Examples of some useful syntax-oriented attributes defined
for the word category N (noun stem) appear in (1) below. Every N
has a value for the CMU attribute drawn from the set (COUNT MASS
UNIT). Ns with the CMU value UNIT (such as "foot", "ton",
"knot") combine easily with plural suffixes and number
expressions (e.g., "two knots", "five feet"), but not so well
with definite determiners ("those two knots"), or genitive
suffixes ("the twenty knots' speed"). (Cf. "the Ethan Allen's
speed".)


(1)  WORDS.DEF N

       FUEL                   CMU = (MASS);
       FOOT                   CMU = (UNIT), PLSUFF = NO;
       LAFAYETTE              CMU = (COUNT);
       SURFACE.DISPLACEMENT   CMU = (COUNT), RELN = T;
       TON                    CMU = (UNIT);


Like the CMU attribute, the RELN attribute is essentially
semantic. It marks such words as "surface displacement",
"speed", "length", and "draft" as special 'relational' noun
words. Syntactically, relational Ns do not combine readily with
plural suffixes and number expressions, and when they do, the
meaning is specialized. To some degree, they are like mass Ns;
"three speeds" (three rates of speed) is analogous to "three
fuels" (three kinds of fuel). However, "a speed of twenty knots"
is acceptable, while "a fuel of two tons" is ill formed.

The attribute PLSUFF distinguishes stems with irregular plurals,
like "foot". Unlike the CMU and RELN attributes, it is purely
syntactic.

Attributes affecting the ability to combine with the plural
suffix "-s" are referenced in the two composition rules of (2),
defining the category NOUN. The attribute statements propagate
the attributes of the stem, adding a number attribute (NBR). The
first factor of N1 references the CMU attribute and states that
if the value is mass, then the score is GOOD. This judgment
incorporates our knowledge that the other rule, N2, cannot apply
to mass noun-stems. If the token is a mass noun-stem, N1 is the
right composition rule to apply.


(2)  RULE.DEF N1  NOUN = N;
       ATTRIBUTES
         CMU,RELN,PLSUFF FROM N, NBR = "(SG);
       FACTORS
         CMU = IF CMU EQ "(MASS) THEN GOOD ELSE OK,
         RELN = IF RELN EQ "T THEN GOOD ELSE OK,
         PLSUFF = IF PLSUFF EQ "NO THEN GOOD ELSE OK;
       EXAMPLES
         SURFACE DISPLACEMENT, FOOT, FUEL (GOOD)
         SUBMARINE (OK);

     RULE.DEF N2  NOUN = N -PL;
       ATTRIBUTES
         CMU,RELN,PLSUFF FROM N, NBR = "(PL);
       FACTORS
         PLSUFF = IF PLSUFF EQ "NO THEN OUT ELSE OK,
         CMU = IF CMU EQ "(MASS) THEN OUT ELSE OK,
         UNIT = IF "UNIT IN CMU THEN GOOD ELSE OK,
         RELN = IF RELN EQ "T THEN POOR ELSE OK;
       EXAMPLES
         FOOT -S, FUEL -S (OUT), TONS (GOOD)
         SURFACE DISPLACEMENTS (POOR), SUBMARINES (OK);
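
The factor statements of N1 and N2 can be traced by hand with a short Python sketch. The stem table follows (1); the entry for SUBMARINE is not listed there, so its COUNT value is an assumption. The `verdict` helper, which reports the most decisive label among a rule's factors, is also hypothetical and is used only to compare results with the EXAMPLES lines of (2).

```python
# Noun stems from (1). SUBMARINE is not listed there; COUNT is assumed.
STEMS = {
    "FUEL": {"CMU": {"MASS"}},
    "FOOT": {"CMU": {"UNIT"}, "PLSUFF": "NO"},
    "TON": {"CMU": {"UNIT"}},
    "SURFACE.DISPLACEMENT": {"CMU": {"COUNT"}, "RELN": True},
    "SUBMARINE": {"CMU": {"COUNT"}},
}


def n1_factors(n):
    """Factors of N1 (NOUN = N), the singular rule."""
    return {
        "CMU": "GOOD" if n["CMU"] == {"MASS"} else "OK",
        "RELN": "GOOD" if n.get("RELN") else "OK",
        "PLSUFF": "GOOD" if n.get("PLSUFF") == "NO" else "OK",
    }


def n2_factors(n):
    """Factors of N2 (NOUN = N -PL), the plural rule."""
    return {
        "PLSUFF": "OUT" if n.get("PLSUFF") == "NO" else "OK",
        "CMU": "OUT" if n["CMU"] == {"MASS"} else "OK",
        "UNIT": "GOOD" if "UNIT" in n["CMU"] else "OK",
        "RELN": "POOR" if n.get("RELN") else "OK",
    }


# Hypothetical summary: the most decisive label wins, OUT first.
SEVERITY = ["OUT", "BAD", "POOR", "VERYGOOD", "GOOD", "OK"]


def verdict(labels):
    return min(labels.values(), key=SEVERITY.index)


for stem, attrs in STEMS.items():
    print(stem, verdict(n1_factors(attrs)), verdict(n2_factors(attrs)))
```

Under these assumptions the verdicts agree with the EXAMPLES lines: FOOT and FUEL are OUT for N2, TON is GOOD, and SURFACE.DISPLACEMENT is POOR.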


Like the CMU factors, the PLSUFF factors enhance the score
for applying N1 to stems that do not take a plural suffix and
constrain N2 not to apply. A RELN factor enhances the score when
N1 is applied to a relational noun stem, and lowers the score
when N2 is applied. Plurals of such stems are less likely than
singulars, but are possible. The UNIT factor of N2 enhances the
score when the composition rule applies to an N with the value
UNIT in its CMU attribute. This judgment is based on the fact
that, in our current task domain, all the measured properties
have measurements exceeding one unit and on the reasonable
expectation that exactly one unit is a special case.

The attributes of the Ns continue to be propagated through
successive composition rules so that noun phrases acquire the
attributes, with some exceptions and additions, of the Ns that
are their heads. One of the additional attributes of NPs is
focus: a noun phrase is definite (DEF) or indefinite (INDEF).

Combining definite focus with a unit is unusual. Compared
with "which submarine", "which seven thousand tons" seems odd, as
do "those twenty knots" and "a draft of the five feet".
Indefinite focus is more common for units: "a ton", "a draft of
five feet", "twenty knots". It does not suggest a uniquely
determinable object or set of objects, pointed to in the
discourse.

How these syntactic tendencies are handled in three
composition-rule definitions is shown in (3). Factors in NP4
eliminate expressions like "five fuels", "five submerged speeds
of three knots", "how much submarine", "one submarines" and "how
many fuel". They score expressions like "five feet" as VERYGOOD
and "five submarines" as OK. Factors for NP7 eliminate "those
submarine", "those fuels" and accept "what fuel" as OK, while
"which tons" and "that draft of five feet" are POOR. Factors for
NP11 eliminate "a fuel", "a draft of the Lafayette", and "a
submarines"; accept "a submarine", "a ton", "the submarine", and
"the submerged speed", and score "the ton" and "the draft of five
feet" as POOR.


(3)  RULE.DEF NP4  NP = NUMBERP NOM;
       ATTRIBUTES
         FOCUS = "INDEF, MOOD,NUM FROM NUMBERP, RELN FROM NOM,
         NBR = GINTERSECT(NBR(NUMBERP),NBR(NOM)),
         CMU = GINTERSECT(CMU(NUMBERP),CMU(NOM));
       FACTORS
         CMU = IF NULL CMU THEN OUT ELSE OK,
         HUN = IF FSTWD(NUMBERP) IN "(HUNDRED THOUSAND MILLION)
               THEN OUT ELSE OK,
         NBR = IF NULL NBR THEN OUT ELSE OK,
         UNIT = IF "UNIT IN CMU THEN VERYGOOD ELSE OK,
         RELN = IF RELN EQ T THEN OUT ELSE OK;

     RULE.DEF NP7  NP = DET NOM;
       ATTRIBUTES
         FOCUS = "DEF, RELN FROM NOM, MOOD FROM DET,
         CMU = GINTERSECT(CMU(DET),CMU(NOM)),
         NBR = GINTERSECT(NBR(DET),NBR(NOM));
       FACTORS
         CMU = IF NULL CMU THEN OUT ELSE OK,
         UNIT = IF "UNIT IN CMU THEN POOR ELSE OK,
         NBR = IF NULL NBR THEN OUT ELSE OK;

     RULE.DEF NP11  NP = ART NOM;
       ATTRIBUTES
         RELN FROM NOM, FOCUS FROM ART, MOOD = "DEC,
         CMU = GINTERSECT(CMU(ART),CMU(NOM)),
         NBR = GINTERSECT(NBR(ART),NBR(NOM));
       FACTORS
         CMU = IF NULL CMU THEN OUT ELSE OK,
         NBR = IF NULL NBR THEN OUT ELSE OK,
         UNIT = IF "UNIT IN CMU AND FOCUS EQ "DEF
                THEN POOR ELSE GOOD,
         RELN = IF RELN EQ T AND FOCUS EQ "INDEF AND
                CMU EQ "(COUNT) THEN OUT ELSE OK;


In each definition, a UNIT factor references the CMU
attribute of the NP. If the value is NIL, the definition is not
applicable. If UNIT is a value, then the UNIT factor for NP4
scores the application as VERYGOOD. There are two reasons for
this judgment. Number expressions are typically found with unit
expressions to form measure expressions, and units are more
likely to occur with indefinite than with definite focus, as the
preceding examples ("twenty knots" and so on) have indicated.

Since the focus for NP7 is always definite, the UNIT factor
decreases the score for applying it when the UNIT value appears
in the CMU attribute. For NP11, the UNIT factor scores the
application GOOD if the article is "a" and UNIT appears in the
CMU values, but POOR if the article is "the".

NP4 applies especially well to instances in which units are
present, but does not apply at all if the head of the nominal
constituent is a RELN stem. In discourse about washing machines
and bicycles, "three speeds" might occur in an ordinary way, but
for our current discourse, we do not anticipate such a
combination. Certainly, we do not expect "three surface
displacements".

Such constraints relieve the need for detailed analysis.
For instance, assume that the acoustic mapper has tentatively
offered both "submarine" and "submerged speed" as acoustically
plausible alternatives for filling the gap in the partially
analyzed phrase "three -s of the U.S. Navy". This is not
improbable since "submarines" and "submerged speeds" resemble
each other in many ways. They both start with "s"; their first
syllables have central vowels; their last syllables have high
front vowels, and so forth. If NP4 is to be applied, however,
the RELN factor will resolve the doubt in favor of "submarine",
and there will be no need to test in depth how well "submerged
speed" maps onto the acoustic data or fits the semantic and
discourse constraints.
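
The economy argued for here, applying a cheap factor before any expensive evaluation, can be sketched as follows. The candidate table, the `acoustic_match` stand-in, and its placeholder return value are all hypothetical; only the RELN test of NP4 is taken from (3).

```python
# Hypothetical lexical entries; RELN marks relational nouns as in (1).
CANDIDATES = {
    "submarine": {"RELN": False},
    "submerged speed": {"RELN": True},
}

expensive_calls = []


def acoustic_match(word):
    """Stand-in for a costly acoustic, semantic, or discourse evaluation."""
    expensive_calls.append(word)
    return 1.0  # placeholder value, not a real measurement


def np4_reln_factor(attrs):
    """RELN factor of NP4: a relational head after a number is OUT."""
    return "OUT" if attrs["RELN"] else "OK"


# Apply the cheap factor first; only survivors reach the costly step.
survivors = [w for w, a in CANDIDATES.items() if np4_reln_factor(a) != "OUT"]
scores = {w: acoustic_match(w) for w in survivors}
```

In this sketch "submerged speed" never reaches `acoustic_match`, which is the saving the RELN factor is meant to provide.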

The UNIT factor of NP11 guides the choice between "a" and
"the", where acoustic evidence for a choice is typically lacking.
Semantically, "a" resembles "one" in its ability to combine with
numbers and units; e.g., "one ton", "a ton", "one hundred", "a
hundred". If the instance of the NOM is "ton", "foot", "knot",
or some other singular expression with the value UNIT for its CMU
attribute, then "a" is judged to be more likely than "the". On
the other hand, if the NOM is "fuel" or "submarines", the article
cannot be "a". The CMU attribute for "a" is (COUNT UNIT), which
does not intersect with the value (MASS) of the CMU attribute for
"fuel"; the NBR attribute is (SG), which does not intersect with
the value (PL) for "submarines". The factors referencing these
attributes rule out application when the intersection is NIL.
These are typical syntactic agreement tests.
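
These agreement tests reduce to checking whether two value sets intersect. The sketch below models GINTERSECT as plain set intersection, which is an assumed reading of the operator; the values for "a" are those given in the text, and the NOM entries follow (1) and (2).

```python
def gintersect(a, b):
    """Assumed reading of GINTERSECT: set intersection (empty means NIL)."""
    return frozenset(a) & frozenset(b)


# Values for "a" come from the text; NOM values follow (1) and (2).
ART_A = {"CMU": {"COUNT", "UNIT"}, "NBR": {"SG"}}
NOMS = {
    "fuel": {"CMU": {"MASS"}, "NBR": {"SG"}},
    "submarines": {"CMU": {"COUNT"}, "NBR": {"PL"}},
    "ton": {"CMU": {"UNIT"}, "NBR": {"SG"}},
}


def np11_agreement(art, nom):
    """CMU and NBR factors of NP11: OUT when the intersection is empty."""
    cmu = gintersect(art["CMU"], nom["CMU"])
    nbr = gintersect(art["NBR"], nom["NBR"])
    return {"CMU": "OK" if cmu else "OUT", "NBR": "OK" if nbr else "OUT"}


results = {word: np11_agreement(ART_A, nom) for word, nom in NOMS.items()}
```

With these values "a fuel" fails on CMU, "a submarines" fails on NBR, and "a ton" passes both tests.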

As longer phrases are built up, the various attributes
interact in other ways. For instance, the syntactic properties
of relational expressions depend on which aspect of the relation
is present in an accompanying prepositional phrase.
Prepositional phrases have the attributes of their NP objects.
When a prepositional phrase modifies a noun with the RELN
attribute, the CMU attribute for the resultant phrase is defined
by taking the union of the values for the two nominal
constituents. As a result, phrases like "surface displacement of
the Lafayette" have the value (COUNT) and those like "surface
displacement of seven thousand tons" have the value (COUNT UNIT).
The difference in values marks the fact that the two examples do
not fit with equal ease in all syntactic environments. It is
referenced in the UNIT and RELN factors in (3) above, to
influence the choice between the two articles, which are seldom
distinguished clearly by sound. The rule is tuned to prefer
"the" in the absence of the UNIT value, as in "the surface
displacement of the Lafayette", and "a" when it is present, as in
"a surface displacement of seven thousand tons". "A surface
displacement of the Lafayette", which implies the possibility of
having more than one surface displacement, is ruled out
completely.
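
The interaction between the CMU union and the article choice can be worked through in a small Python sketch. The UNIT and RELN factors are transcribed from NP11 in (3), and the values for "a" come from the text. The CMU value assigned to "the" is an assumption, as are the function and variable names.

```python
def rel_nom_cmu(head_cmu, pp_cmu):
    """CMU of a relational noun modified by a PP: union of the two values."""
    return frozenset(head_cmu) | frozenset(pp_cmu)


# "a" follows the text; the CMU value for "the" is an assumption.
ARTICLES = {
    "a": {"CMU": {"COUNT", "UNIT"}, "FOCUS": "INDEF"},
    "the": {"CMU": {"COUNT", "MASS", "UNIT"}, "FOCUS": "DEF"},
}


def np11_factors(article, nom_cmu, reln=True):
    """UNIT and RELN factors of NP11 as given in (3)."""
    art = ARTICLES[article]
    cmu = frozenset(art["CMU"]) & frozenset(nom_cmu)
    unit = "POOR" if "UNIT" in cmu and art["FOCUS"] == "DEF" else "GOOD"
    out = reln and art["FOCUS"] == "INDEF" and cmu == {"COUNT"}
    return {"UNIT": unit, "RELN": "OUT" if out else "OK"}


# "surface displacement of the Lafayette" and "... of seven thousand tons"
of_lafayette = rel_nom_cmu({"COUNT"}, {"COUNT"})
of_tons = rel_nom_cmu({"COUNT"}, {"UNIT"})

table = {
    (art, name): np11_factors(art, cmu)
    for art in ARTICLES
    for name, cmu in (("of_lafayette", of_lafayette), ("of_tons", of_tons))
}
```

Under these assumptions "a surface displacement of the Lafayette" is OUT on the RELN factor, while "the surface displacement of seven thousand tons" survives but is POOR on the UNIT factor.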

NPs also have a MOOD attribute, derived from their initial
constituents. It is either declarative (DEC) as in "this
submarine", or WH-interrogative (WH) as in "which submarine".
The WH value is propagated to the larger phrases in which NPs are
constituents. Sentences (S) and utterances (U) take the value
for their MOOD attribute from an initial NP. Our current
vocabulary does not include verbs like "know" and "tell", which
can embed WH questions like "Do you know what the surface
displacement is?" For the time being, we assume that noninitial
noun phrases are not likely to have the value WH for MOOD. Echo
questions, e.g., "You said what?", are not ruled out, but have
lower scores.

The convergence of many attributes at the higher levels of
phrase composition makes possible many discriminatory judgments.
Some of them are shown in (4).


(4)  RULE.DEF S3  S = NP:NP1 AUXB NP:NP2;
       ATTRIBUTES
         MOOD,FOCUS,CMU,RELN FROM NP1,
         AFFNEG FROM AUXB;
       FACTORS
         NBRAGR1 = IF CMU EQUAL "(UNIT) THEN
                   (IF NBR(AUXB) EQUAL "(SG) THEN OK ELSE OUT) ELSE
                   IF GINTERSECT(NBR(NP1),NBR(AUXB)) THEN OK ELSE OUT,
         NBRAGR2 = IF CMU(NP2) EQUAL "(UNIT) THEN OK ELSE
                   IF GINTERSECT(NBR(NP2),NBR(AUXB)) THEN OK ELSE OUT,
         FOCUS = IF FOCUS(NP1) EQ "INDEF AND FOCUS(NP2) EQ "DEF
                 THEN POOR ELSE OK,
         GCASE1 = IF GCASE(NP1) EQUAL "(ACC) THEN OUT ELSE OK,
         GCASE2 = IF GCASE(NP2) EQUAL "(ACC) THEN OUT ELSE OK,
         MOOD1 = IF MOOD EQUAL "(WH) THEN GOOD ELSE OK,
         MOOD2 = IF MOOD EQUAL "(WH) AND MOOD(NP2) EQUAL "(WH)
                 THEN POOR ELSE OK,
         AFFNEG = IF MOOD EQUAL "(WH) AND AFFNEG EQ "NEG THEN
                  BAD ELSE OK,
         RELN = IF RELN EQ "T AND CMU(NP2) EQUAL "(UNIT)
                THEN VERYGOOD ELSE OK,
         PERSAGR = IF GINTERSECT(PERS(NP1),PERS(AUXB))
                   THEN OK ELSE OUT;
       EXAMPLES
         THE LAFAYETTE IS A SUBMARINE (OK)
         THE LAFAYETTE IS SUBMARINES, WHAT IS THEM (OUT)
         A LAFAYETTE IS THE SUBMARINE (POOR)
         THEM ARE SUBMARINES, IT AM A SHIP (OUT)
         WHAT IS IT, WHAT IS THE LENGTH (GOOD)
         HOW MANY ARE WHAT (POOR)
         WHAT ISN'T THE SURFACE DISPLACEMENT (BAD)
         THE SURFACE DISPLACEMENT IS 7000 TONS (VERYGOOD);

The PERSAGR (person-agreement) factor tests for agreement
between the so-called pronouns and the auxiliary constituent.
The two grammatical case factors, GCASE1 and GCASE2, require that
the grammatical cases of the two NPs are not accusative. These
traditional syntactic agreement tests block application of the
composition rule to putative expressions like "it are" and "they
is". "Them is" is doubly blocked.

Some of the remaining factor statements in (4) are less
traditional. One of these is the AFFNEG factor, which references
both the MOOD and AFFNEG attributes and reduces the score greatly
if the instance is purportedly a negative WH question like "what
isn't the surface displacement?" Genuine requests for negative
information occur in highly circumscribed situations. The
rhetorical question is not a genuine request for information
(e.g., "Who wouldn't like to be rich and famous!"). "Who isn't
here?" is reasonable only if there is an established and limited
list of people who are expected to be present, as in a classroom.
"What isn't your name?" and "Where don't you live?" are patently
absurd.

The constraint on negative WH questions is essentially due
to pragmatic forces as well as semantic ones. Similar forces are
at work in observed tendencies for the first NP in the
composition defined by S3 to be indefinite in focus only when the
second one is also. Stated oversimply, in coherent discourse,
what has already been talked about (the "old" information) tends
to come first. What is predicated about it (the "new"
information) tends to follow. Old information is information
that has already been talked about and established in the
discourse, so that it is likely to be encoded in definite noun
phrases. These are likely to be in subject position, so that the
sentence they introduce is consistent with preceding sentences.
New information tends to be introduced in indefinite noun
phrases. The next mention of the "same thing" will then be old
information, eligible for definite focus. Consequently, "A
Lafayette is that submarine" seems peculiar, relative to "That
submarine is a Lafayette". "A Lafayette is it" is still more
peculiar. These discourse-based probabilistic tendencies are
expressed in the FOCUS factor of S3.

The CMU attribute, as previously noted, is not purely
syntactic. On the other hand, matters like number agreement have
always been central to syntax. It is particularly interesting,
therefore, that the number agreement constraints for S3 cannot be
properly stated without appealing to CMU. To state number
agreement constraints, Ns denoting units must be marked
separately. Sentences like "These are a submarine", "These is a
torpedo tube", "These is missile launchers", and "This are subs"
are clearly ungrammatical, and the ungrammaticality is usually
attributed to the fact that one of the constituents differs in
grammatical number from the other two. However, "The surface
displacement is seven thousand tons" is wholly grammatical even
though two of the constituents are singular and the third is
plural. Such use of semantic attributes in syntactic factors
points to the conclusion that the integration of information from
different sources of knowledge is well motivated on both
linguistic and heuristic grounds.
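
The dependence of number agreement on CMU can be made concrete by transcribing the NBRAGR1 and NBRAGR2 factors of (4) into Python. The factor logic follows (4); the attribute values assigned to the sample phrases are assumptions made for illustration.

```python
def nbragr1(np1, auxb):
    """NBRAGR1 of S3: a unit-valued first NP requires a singular auxiliary."""
    if np1["CMU"] == {"UNIT"}:
        return "OK" if auxb["NBR"] == {"SG"} else "OUT"
    return "OK" if np1["NBR"] & auxb["NBR"] else "OUT"


def nbragr2(np2, auxb):
    """NBRAGR2 of S3: a unit-valued second NP is exempt from agreement."""
    if np2["CMU"] == {"UNIT"}:
        return "OK"
    return "OK" if np2["NBR"] & auxb["NBR"] else "OUT"


IS = {"NBR": {"SG"}}
ARE = {"NBR": {"PL"}}

# Attribute values below are assumed for illustration.
displacement = {"CMU": {"COUNT"}, "NBR": {"SG"}}   # "the surface displacement"
tons = {"CMU": {"UNIT"}, "NBR": {"PL"}}            # "seven thousand tons"
these = {"CMU": {"COUNT"}, "NBR": {"PL"}}          # "these"
a_submarine = {"CMU": {"COUNT"}, "NBR": {"SG"}}    # "a submarine"

# "The surface displacement is seven thousand tons"
good = (nbragr1(displacement, IS), nbragr2(tons, IS))
# "These are a submarine"
bad = (nbragr1(these, ARE), nbragr2(a_submarine, ARE))
```

The plural "tons" passes only because its CMU value is (UNIT); without that marking, NBRAGR2 would reject the sentence just as it rejects "These are a submarine".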

Because of the high frequency of WH questions in the
protocols from which the vocabulary and phrase types were
selected, the PG is now tuned to expect them. A sentence defined
by S3 receives a higher score from the MOOD1 factor if its MOOD
is WH. This tuning can easily be changed without altering the
syntax or semantics of the language. If the user both extracts
data from the data base and enters data into it, with no
predictable pattern of alternation, factors like MOOD1 can simply
be removed. A more interesting alternative is to reset them
dynamically in a discourse context where the computer sometimes
asks questions for the user to answer. After each user question,
the grammar could be tuned to expect a declarative utterance
whose syntax and semantics were appropriate and relevant.
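
One way such dynamic resetting might be arranged is sketched below. The class `S3Tuning` and its method names are hypothetical, MOOD values are simplified to strings rather than the lists used in (4), and the trigger for retuning (the system having just asked a question) is one possible reading of the scenario.

```python
def mood_factor(expected):
    """Build a MOOD1-style factor that rewards the expected mood."""
    return lambda attrs: "GOOD" if attrs["MOOD"] == expected else "OK"


class S3Tuning:
    """Holds the tuneable MOOD1 factor; the rest of S3 is left untouched."""

    def __init__(self):
        self.mood1 = mood_factor("WH")     # current tuning: expect WH questions

    def after_system_question(self):
        self.mood1 = mood_factor("DEC")    # now expect a declarative answer

    def after_user_answer(self):
        self.mood1 = mood_factor("WH")     # back to the default tuning


tuning = S3Tuning()
wh = {"MOOD": "WH"}
dec = {"MOOD": "DEC"}

default_scores = (tuning.mood1(wh), tuning.mood1(dec))
tuning.after_system_question()
retuned_scores = (tuning.mood1(wh), tuning.mood1(dec))
```

Only the factor is swapped; the production and attribute statements of S3 would stay as they are, which is what makes the tuning cheap to change.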